
Enhancing Big Data in the Social Sciences with Crowdsourcing: Data Augmentation Practices, Techniques, and Opportunities


Abstract

The importance of big data is a contested topic among social scientists. Proponents claim it will fuel a research revolution, but skeptics challenge it as unreliably measured and decontextualized, with limited utility for accurately answering social science research questions. We argue that social scientists need effective tools to quantify big data's measurement error and expand the contextual information associated with it. Standard research efforts in many fields already pursue these goals through data augmentation, the systematic assessment of measurement against known quantities and expansion of extant data by adding new information. Traditionally, these tasks are accomplished using trained research assistants or specialized algorithms. However, such approaches may not be scalable to big data or appease its skeptics. We consider a third alternative that may increase the validity and value of big data: data augmentation with online crowdsourcing. We present three empirical cases to illustrate the strengths and limits of crowdsourcing for academic research, with a particular eye to how they can be applied to data augmentation tasks that will accelerate acceptance of big data among social scientists. The cases use Amazon Mechanical Turk to (1) verify automated coding of the academic discipline of dissertation committee members, (2) link online product pages to a book database, and (3) gather data on mental health resources at colleges. In light of these cases, we consider the costs and benefits of augmenting big data with crowdsourcing marketplaces and provide guidelines on best practices. We also offer a standardized reporting template that will enhance reproducibility.
